Flow Map

Pon Anu Reka

2023-10-26

1 Objective

The objective of this notebook is to use the Lorenz attractor to determine the dynamics of the system and to gain insights into the behavior and characteristics of a chaotic system.

2 Experimental setup

In this section we aim to investigate how the Lorenz system’s state transitions from one state to another at each time stamp. By studying these state transitions, we seek to identify patterns, trends, and potential underlying dynamics within the system. This analysis can provide valuable insights into the behavior and evolution of the system, enabling us to understand its characteristics and potentially make predictions or interpretations based on the observed patterns.

We perform this experiment in the following steps:

3 Data understanding

The Lorenz attractor is a three-dimensional figure generated by a set of differential equations that model a simple chaotic dynamic system of convective flow. It arises from a simplified set of equations describing a system of three variables. These variables represent the state of the system at any given time and are typically denoted by (x, y, z). The equations are as follows:

\[ \frac{dx}{dt} = \sigma (y - x) \]

\[ \frac{dy}{dt} = x (r - z) - y \]

\[ \frac{dz}{dt} = x y - \beta z \]

where dx/dt, dy/dt, and dz/dt represent the rates of change of x, y, and z respectively over time (t). σ, r, and β are constant parameters of the system, with σ (σ = 10) controlling the rate of convection, r (r = 28) controlling the difference in temperature between the convective and stable regions, and β (β = 8/3) representing the ratio of the width to the height of the convective layer.

When these equations are plotted in three-dimensional space, they produce a chaotic trajectory that never repeats. The Lorenz attractor exhibits sensitive dependence on initial conditions, meaning even small differences in the starting conditions can lead to drastically different trajectories over time. This sensitivity to initial conditions is a defining characteristic of chaotic systems.
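To make the sensitivity to initial conditions concrete, the following is a minimal base-R sketch that integrates the equations above with a classical fourth-order Runge-Kutta scheme. The helper names (`lorenz_rhs`, `rk4_step`, `simulate_lorenz`), the step size `dt = 0.01`, and the perturbation of 1e-8 are illustrative assumptions; this is not how the dataset used in this notebook was generated.

```r
# Right-hand side of the Lorenz equations; state is c(x, y, z).
lorenz_rhs <- function(state, sigma = 10, r = 28, beta = 8 / 3) {
  x <- state[[1]]
  y <- state[[2]]
  z <- state[[3]]
  c(sigma * (y - x),
    x * (r - z) - y,
    x * y - beta * z)
}

# One classical fourth-order Runge-Kutta step of size dt.
rk4_step <- function(state, dt) {
  k1 <- lorenz_rhs(state)
  k2 <- lorenz_rhs(state + 0.5 * dt * k1)
  k3 <- lorenz_rhs(state + 0.5 * dt * k2)
  k4 <- lorenz_rhs(state + dt * k3)
  state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
}

# Integrate n_steps steps; returns a matrix with one row per time stamp.
simulate_lorenz <- function(initial, n_steps, dt = 0.01) {
  out <- matrix(NA_real_, nrow = n_steps + 1, ncol = 3,
                dimnames = list(NULL, c("X", "Y", "Z")))
  out[1, ] <- initial
  for (i in seq_len(n_steps)) {
    out[i + 1, ] <- rk4_step(out[i, ], dt)
  }
  out
}

# Two trajectories whose starting points differ by 1e-8 in X only.
traj_a <- simulate_lorenz(c(0, 1, 20), n_steps = 4000)
traj_b <- simulate_lorenz(c(1e-8, 1, 20), n_steps = 4000)

# Euclidean separation between the two trajectories at selected steps.
separation <- sqrt(rowSums((traj_a - traj_b)^2))
print(separation[c(1, 1001, 2001, 4001)])
```

The printed separations start at the size of the perturbation and grow until they are comparable to the size of the attractor itself, which is the behaviour the paragraph above describes.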

In this section, we will use the Lorenz Attractor Dataset. This dataset contains 200,000 observations and 5 columns. The dataset can be downloaded from here

The dataset includes the following columns:

Here, we load the data and explore the Lorenz Attractor Dataset. For the sake of brevity, we display only the first ten rows.

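library(dplyr)  # provides %>% and select()

# Load the dataset and keep the coordinate (X, Y, Z), velocity (U), and timestamp (t) columns
dataset <- read.csv("./sample_dataset/lorenze_attractor.csv")
dataset <- dataset %>% dplyr::select(X, Y, Z, U, t)
dataset$t <- round(dataset$t, 5)
DT::datatable(head(dataset, 10), options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')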

3.1 Training dataset

Let’s have a look at the Training dataset containing 160,000 data points. For the sake of brevity we are displaying first 10 rows.

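# Chronological 80/20 split: the first 80% of rows form the training set
noOfPoints <- nrow(dataset)
trainLength <- as.integer(noOfPoints * 0.8)
trainDataset <- dataset[1:trainLength, ]
# Qualify select() so it cannot be masked by another attached package
trainData <- trainDataset %>% dplyr::select(X, Y, Z)
DT::datatable(head(trainDataset, 10), options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')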

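Now, let us analyse the summary of the training dataset.


# Per-column summary statistics, one row per metric
df <- do.call(cbind, lapply(trainDataset, summary)) %>% 
  data.frame() %>%
  tibble::rownames_to_column("Metrics")

# Same table options as the other datatable() calls
DT::datatable(df %>% 
                mutate_if(is.numeric, round, 4) %>% 
                head(),
    options  = list(pageLength = 10, scrollX = TRUE),
    rownames = FALSE)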

3.2 Test dataset

Let’s have a look at the Test dataset containing 40,000 data points. For the sake of brevity we are displaying first 10 rows.

testDataset <- dataset[(trainLength+1):noOfPoints,]
DT::datatable(head(testDataset, 10),options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')

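Now, let us analyse the summary of the test dataset.

# Per-column summary statistics, one row per metric
df_test <- do.call(cbind, lapply(testDataset, summary)) %>% 
  data.frame() %>%
  tibble::rownames_to_column("Metrics")

# Same table options as the other datatable() calls
DT::datatable(df_test %>% 
                mutate_if(is.numeric, round, 4) %>% 
                head(),
    options  = list(pageLength = 10, scrollX = TRUE),
    rownames = FALSE)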

4 Lorenz attractor (3D Space)

When the Lorenz attractor is visualized in a three-dimensional space, it forms a complex and intricate structure. It consists of a set of looping and spiraling curves that are confined within a specific region. The attractor has a butterfly-like shape, with two large wings and a narrow body connecting them.

Now let’s try to visualize the Lorenz attractor (overlapping spirals) in 3D Space.


# Plot a random sample of 1,000 points, coloured by their Z value
data_3d <- dataset[sample(1:nrow(dataset), 1000), ]
plot_3d <- plotly::plot_ly(data_3d, x = ~X, y = ~Y, z = ~Z) %>%
  plotly::add_markers(marker = list(
    size = 2,
    symbol = "circle",
    color = ~Z,
    colorscale = "Bluered",
    colorbar = list(title = 'z_var')))
plot_3d

Figure 1: Lorenz attractor in 3D space

5 Lorenz attractor (2D Space)

We will use the HVT function to compress our data while preserving the essential features of the dataset. Our goal is to achieve data compression of at least 80%. If the compression ratio does not meet this target, we can adjust the model parameters, for example by changing the quantization error threshold or increasing the number of cells, and then rerun the HVT function.
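The effect of the number of cells on compression can be sketched without HVT itself. The example below is only an approximation under stated assumptions: it treats the compression percentage as the share of cells whose quantization error is within the threshold (suggested by HVT's `percentOfCellsBelowQuantizationErrorThreshold` summary column), uses `stats::kmeans` on random toy data, and measures each cell's error as the maximum L1 distance to its centroid. The helper `percent_cells_below` is hypothetical, and HVT's internal computation may differ.

```r
set.seed(240)

# Toy 3-column data standing in for the X, Y, Z coordinates (assumption).
toy <- scale(matrix(rnorm(3000), ncol = 3,
                    dimnames = list(NULL, c("X", "Y", "Z"))))

# Share (in percent) of cells whose maximum L1 distance to the centroid
# is within the quantization error threshold.
percent_cells_below <- function(data, n_cells, quant_err) {
  fit <- stats::kmeans(data, centers = n_cells, iter.max = 50)
  # L1 distance of every point to the centroid of its assigned cell.
  l1_dist <- rowSums(abs(data - fit$centers[fit$cluster, , drop = FALSE]))
  # Per-cell quantization error under a "max" error metric.
  cell_error <- tapply(l1_dist, fit$cluster, max)
  100 * mean(cell_error <= quant_err)
}

# More cells give smaller cells, so more of them fall within the threshold.
for (n in c(10, 50, 200)) {
  cat(n, "cells:", percent_cells_below(toy, n, quant_err = 0.5), "%\n")
}
```

In this sketch, raising `quant_err` or `n_cells` are the two levers that increase the percentage, which mirrors the parameter adjustments mentioned above.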

We will pass the model parameters listed below, along with the Lorenz attractor training dataset, to the HVT function and check whether the desired compression percentage is achieved.

Model Parameters

Let’s have a look at the training dataset containing 160,000 data points. For the sake of brevity, we display only the first 10 rows. We exclude the U and t columns here, so that compression takes place only on the X, Y, Z coordinates and not on U (velocity) or t (timestamp). After training, we merge the U and t columns back into the dataset for prediction.

DT::datatable(head(trainData, 10),options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')

Now, let us analyse the structure of training dataset.

str(trainData)
#> 'data.frame':    160000 obs. of  3 variables:
#>  $ X: num  0 0.0025 0.00499 0.00747 0.00995 ...
#>  $ Y: num  1 1 1 0.999 0.999 ...
#>  $ Z: num  20 20 20 20 19.9 ...


set.seed(240)
hvt.results <- HVT::HVT(
  trainData,
  n_cells = 100,
  depth = 1,
  quant.err = 0.1,
  projection.scale = 10,
  normalize = TRUE,
  distance_metric = "L1_Norm",
  error_metric = "max",
  quant_method = "kmeans"
)

Let’s check out the compression summary.

compressionSummaryTable(hvt.results[[3]]$compression_summary)
| segmentLevel | noOfCells | noOfCellsBelowQuantizationError | percentOfCellsBelowQuantizationErrorThreshold | parameters |
|---|---|---|---|---|
| 1 | 100 | 0 | 0 | n_cells: 100 quant.err: 0.1 distance_metric: L1_Norm error_metric: max quant_method: kmeans |

For better visualisation, let’s plot the Voronoi tessellation for 100 cells.

Figure 2: The Voronoi tessellation for layer 1 shown for the 100 cells in the dataset ’lorenz attractor’


6 Prediction

Let’s have a look at the dataset we use for prediction which contains 200,000 data points. For the sake of brevity we are displaying first 10 rows.

DT::datatable(head(dataset, 10),options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')

Now, let us analyse the structure of dataset we use for prediction.

str(dataset)
#> 'data.frame':    200000 obs. of  5 variables:
#>  $ X: num  0 0.0025 0.00499 0.00747 0.00995 ...
#>  $ Y: num  1 1 1 0.999 0.999 ...
#>  $ Z: num  20 20 20 20 19.9 ...
#>  $ U: num  0 0.0005 0.001 0.0015 0.002 ...
#>  $ t: num  0 0.00025 0.0005 0.00075 0.001 0.00125 0.0015 0.00175 0.002 0.00225 ...

Now that we have built the model, let us predict which cell and which level each point of the dataset belongs to.

set.seed(240)
predictions <- HVT::predictHVT(
  dataset,
  hvt.results,
  child.level = 1,
  line.width = c(1.2),
  color.vec = c("#141B41"),
  quant.error.hmap = 0.1,
  n_cells.hmap = 100
)

The Flow Map functions described in the next section require the Cell ID from the prediction output and the sorted timestamp from the dataset we used for prediction. We therefore merge the two to get a data frame that pairs each cell ID with its timestamp.

Let’s see which cell and level each point belongs to, together with the sorted timestamp. For the sake of brevity, we show only the first 10 rows.


scored_data <- predictions[["scoredPredictedData"]] %>%
  round(2) %>% cbind(dataset) %>% 
  as.data.frame()
colnames(scored_data) <- c("Segment.Level", "Segment.Parent", "Segment.Child", "n", "Cell.ID", "Quant.Error", "pred_X", "pred_Y", "pred_Z", "centroidRadius", "diff", "anomalyFlag", "X", "Y", "Z", "U", "t")
DT::datatable(head(scored_data, 10),options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')
cat("\n")

7 Functions for flow map visualizations

7.1 Function to create timeseries plot



# **Description - It serves as a tool for exploring and understanding temporal patterns and transitions in the data**
# 
# This state_transition_plot function is designed to visualize and analyze sequential data representing state transitions. It takes as input a dataset with state information over time and generates different types of plots based on user preferences. Users can choose to create a timeseries plot of state transitions or a timeseries with lines connecting the state transitions. Additionally, the function allows for data sampling to focus on specific time periods.
# 
# **Usage**
# 
# > state_transition_plot(df, sample_size = 0.2, line_plot = FALSE, cellid_column = "Cell.ID", time_column = "t")
# 
# **Arguments**
#
# * @param **df** (dataframe) - A dataframe with prediction output and along with the dataset we used for predictHVT function
# * @param **sample_size** (numeric) - Need to specify the sampling value which ranges between 0.1 to 1. The highest value 1, outputs a plot with the entire dataset. Sampling of data takes place from the last
# * @param **line_plot** (logical) - If TRUE, the output will be a timeseries plot with a line connecting the states according to the sample_size otherwise, a timeseries plot but without a line based on the sample_size will be the output
# * @param **cellid_column** (character) - Specify the column name of Cell ID from the dataframe you pass to this function
# * @param **time_column** (character) - Specify the column name of the timestamp from the dataframe you pass to this function


state_time_plot_result <- state_transition_plot(df = scored_data, cellid_column = "Cell.ID", time_column = "t")
state_time_plot_result


7.2 Function to create transition probability tables

This function displays, for every Cell ID, a table of the Tplus1 states and their transition probabilities. For the sake of brevity, we display the probability table for Cell ID 1.


# **Description - It is useful for analyzing and visualizing state transition patterns in a dataset**
# 
# The get_transition_probability_table function calculates transition probabilities for distinct states within a specified column of a dataframe (df). It computes the likelihood of transitioning from one state to another in sequential rows and presents the results as data frames in a list. Each data frame contains information about the next state (Tplus1_States), the frequency of this transition (Frequency), and the calculated transition probability (Probability). Additionally, the function displays these probability tables for each unique state and stores them in a global variable named trans_prob_df
# 
# **Usage**
# 
# > get_transition_probability_table(df, cellid_column = "Cell.ID", time_column = "t")
# 
# **Arguments**
# 
# * @param **df** (dataframe) - A dataframe with prediction output and along with the dataset we used for predictHVT function
# * @param **cellid_column** (character) - Specify the column name of Cell ID from the dataframe you pass to this function
# * @param **time_column** (character) - Specify the column name of the timestamp from the dataframe you pass to this function
get_transition_probability_table(df = scored_data, cellid_column = "Cell.ID", time_column = "t")
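The calculation described in the comments above can be sketched in a few lines of base R. This is only an illustration on a toy sequence: the helper `transition_table_sketch` is hypothetical and is not the notebook's `get_transition_probability_table`, although it uses the same output column names (Tplus1_States, Frequency, Probability).

```r
# Hypothetical helper: transition table for one state, built from a
# data frame that holds a cell ID and a timestamp per observation.
transition_table_sketch <- function(df, state, cellid_column = "Cell.ID",
                                    time_column = "t") {
  # Order observations by time so consecutive entries are consecutive time stamps.
  cells <- df[[cellid_column]][order(df[[time_column]])]
  current <- head(cells, -1)
  nxt <- tail(cells, -1)
  # Count the next states observed after every visit to `state`.
  counts <- table(nxt[current == state])
  data.frame(
    Tplus1_States = as.integer(names(counts)),
    Frequency = as.integer(counts),
    Probability = round(as.numeric(counts) / sum(counts), 4)
  )
}

# Toy data: ten time stamps visiting three cells.
toy <- data.frame(t = 1:10, Cell.ID = c(1, 1, 1, 2, 2, 1, 1, 3, 1, 2))
tab <- transition_table_sketch(toy, state = 1)
print(tab)
```

In the toy sequence, cell 1 is followed by cell 1 three times, by cell 2 twice, and by cell 3 once, so the probabilities are 0.5, 0.3333, and 0.1667.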

7.2.1 to 7.2.100 Cell ID 1 to Cell ID 100

Probability tables for Cell ID 1 through Cell ID 100, one table per cell.

7.3 Function to reconcile transition probability using markovchain package


# **Description - It is used to generate and visualize transition probability matrices for state data**
# 
# The reconcile_transition_probability function computes and visualizes transition probabilities for state data. It calculates transition probabilities between consecutive states in the input dataset, both with and without self-transitions, and generates heatmaps of these probabilities to show how states change over time. It also performs Markov Chain analysis on the data, producing transition matrices with and without self-transitions, along with corresponding heatmaps.
# 
# **Usage**
# 
# > reconcile_transition_probability(df, hmap_type = "All", cellid_column = "Cell.ID", time_column = "Timestamp")
# 
# **Arguments**
# 
# * @param **df** (dataframe) - A dataframe with prediction output and along with the dataset we used for predictHVT function
# * @param **hmap_type** (character) - If set to without_self_state, reconciliation plots for manual and Markovchain for highest transition probability excluding the self-state is given as output, if set to with_self_state, reconciliation plots for manual and Markovchain for highest transition probability considering the self-state is given as output and if set to All, plots including and excluding self-state is given as output
# * @param **cellid_column** (character) - Specify the column name of Cell ID from the dataframe you pass to this function
# * @param **time_column** (character) - Specify the column name of the timestamp from the dataframe you pass to this function

reconcile_plots <- reconcile_transition_probability(df = scored_data, hmap_type = "All", cellid_column = "Cell.ID", time_column = "t")

7.3.1 Manual reconciliation of transition probability with self-state

The darker diagonal cells indicate higher probabilities of staying in the same state. These transitions represent situations where there is no change from the current state to the next state. Such states might be attractors in a dynamic system, where the system naturally tends to return to these states even after minor perturbations.









7.3.2 Manual reconciliation of transition probability without self-state

In this plot, the transitions suggest that the states tend to move to neighboring states more frequently. Proximity might not only refer to physical distance but also to similarities in attributes or conditions.
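The difference between the two manual views can be sketched in base R on a toy sequence. The "without self-state" convention used here (zero the diagonal of the count matrix, then renormalise each row) is an assumption for illustration; the notebook's reconcile_transition_probability function may handle self-transitions differently.

```r
cells <- c(1, 1, 1, 2, 2, 1, 1, 3, 1, 2)  # toy time-ordered cell IDs
states <- sort(unique(cells))

# Count transitions between consecutive time stamps (rows = from, columns = to).
counts <- unclass(table(
  from = factor(head(cells, -1), levels = states),
  to   = factor(tail(cells, -1), levels = states)
))

# With self-state: normalise each row of the raw counts.
with_self <- prop.table(counts, margin = 1)

# Without self-state (assumed convention): drop the diagonal, then renormalise.
no_self_counts <- counts
diag(no_self_counts) <- 0
without_self <- prop.table(no_self_counts, margin = 1)

print(round(with_self, 3))
print(round(without_self, 3))
```

A state that only ever transitions to itself would have an all-zero row after the diagonal is removed, and its renormalised row would be NaN; real data may need a guard for that case.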










7.3.3 Markovchain reconciliation of transition probability with self-state

This heatmap uses the same data as the manual reconciliation process and computes the transition probabilities, including self-state transitions, with the markovchainFit function.
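As a rough illustration of what such a reconciliation checks, the sketch below compares a manually computed transition matrix with the maximum-likelihood estimate from `markovchain::markovchainFit` on a toy sequence. The toy data and variable names are assumptions, and the comparison runs only if the markovchain package is installed.

```r
cells <- as.character(c(1, 1, 1, 2, 2, 1, 1, 3, 1, 2))  # toy cell-ID sequence

# Manual estimate: row-normalised counts of consecutive pairs.
manual <- unclass(prop.table(table(head(cells, -1), tail(cells, -1)), margin = 1))

if (requireNamespace("markovchain", quietly = TRUE)) {
  fit <- markovchain::markovchainFit(data = cells, method = "mle")
  mc_matrix <- fit$estimate@transitionMatrix
  # Align rows and columns by state name before comparing.
  mc_matrix <- mc_matrix[rownames(manual), colnames(manual)]
  cat("Largest absolute difference:", max(abs(manual - mc_matrix)), "\n")
} else {
  message("Package 'markovchain' is not installed; showing the manual estimate only.")
  print(round(manual, 3))
}
```

If the two estimates agree, the reported difference should be zero up to floating-point error, since both are row-normalised transition counts.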








7.3.4 Markovchain reconciliation of transition probability without self-state

This heatmap uses the same data as the manual reconciliation process and computes the transition probabilities, excluding self-state transitions, with the markovchainFit function.







7.4 Function to create flowmap visualizations

# **Description - It is designed for creating and visualizing flow maps based on input data**
# 
# The generate_flow_maps function in R extracts centroid coordinates and probability data from input. It generates two types of flow maps, one based on the second-highest probability and another on the highest probability, using arrows to represent state transitions. Additionally, it offers optional animations to visualize transitions over time, either sorted by timestamps or based on the next state. Users can customize the type of maps and animations they want to create for exploring state transitions in their data.
# 
# **Usage**
# 
# > generate_flow_maps(hvt_model_output, transition_probability_df, hvt_plot_output, df, animation = "All", flow_map = "All", animation_speed = 2, threshold = 0.6, cellid_column = "Cell.ID", time_column = "t")
# 
# 
# **Arguments**
# 
# * @param **hvt_model_output** (list) - It is an output list in hierarchy from hvt model training. To get the centroid coordinates, we retrieve the second element, then within the second element, obtain the 1st element's `1`. And to get the Cell IDs, we retrieve the third element, then within the third element, obtain the Cell.ID from summary
# * @param **transition_probability_df** (dataframe) - A list of dataframes which is the output from the get_transition_probability_table function
# * @param **hvt_plot_output** (list) - Base plot for the flow maps
# * @param **df** (dataframe) - A dataframe with prediction output and along with the dataset we used for predictHVT function
# * @param **animation** (character) - If set to time_based, the output is a dot animation of state transitions ordered by sorted Timestamp. If set to state_based, the output is an arrow animation based on the highest-probability next state excluding the self-state. If set to All, both animations are produced
# * @param **flow_map** (character) - If set to self_state, the output is a dot flowmap for the next state based on the highest transition probability. If set to without_self_state, the output is an arrow flowmap pointing to the next state with the highest transition probability excluding the self-state, with arrow size based on the distance between the two states. If set to probability, the output is an arrow flowmap pointing to the next state with the highest transition probability excluding the self-state, with arrow size based on that probability. If set to All, all three flowmaps are produced
# * @param **animation_speed** (numeric) - Must be a numeric value and a factor of 100
# * @param **threshold** (numeric) - Ranges from 0.1 to 1. This value controls the categorization of probability values into "High Probability" and "Low Probability" for the flow map type "Probability"
# * @param **cellid_column** (character) - Specify the column name of Cell ID from the dataframe you pass to this function
# * @param **time_column** (character) - Specify the column name of the timestamp from the dataframe you pass to this function

source("../R/flowmap.R")
plots <- generate_flow_maps(hvt_model_output = hvt.results, transition_probability_df = trans_prob_df, hvt_plot_output = hvt.plot, df = scored_data, animation = "All", flow_map = "All", animation_speed = 2, threshold = 0.7, cellid_column = "Cell.ID", time_column = "t")


7.4.1 Flow map: Highest transition probability without considering self-state

Arrow lengths in the flow map below are based on the Euclidean distance between the current state and the next state.
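The arrow construction can be sketched as follows: for each cell, pick the next state with the highest transition probability other than the cell itself, then measure the Euclidean distance between the two centroids. The centroid coordinates and the probability matrix below are made-up toy values, and the variable names are hypothetical; this is not the code in generate_flow_maps.

```r
# Hypothetical 2D centroid coordinates for three cells (toy values).
centroids <- data.frame(Cell.ID = 1:3, x = c(0, 3, 0), y = c(0, 4, 2))

# Toy transition probabilities: rows = current state, columns = next state.
trans_prob <- matrix(c(0.5, 0.3, 0.2,
                       0.6, 0.4, 0.0,
                       0.1, 0.2, 0.7),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(as.character(1:3), as.character(1:3)))

# Ignore self-transitions, then pick the most probable next state per row.
no_self <- trans_prob
diag(no_self) <- -Inf
next_state <- as.integer(colnames(no_self)[apply(no_self, 1, which.max)])

# One arrow per cell, from its centroid to the centroid of its next state.
from <- centroids[match(as.integer(rownames(trans_prob)), centroids$Cell.ID), ]
to <- centroids[match(next_state, centroids$Cell.ID), ]
arrows_df <- data.frame(
  Cell.ID = from$Cell.ID,
  Next.State = next_state,
  Probability = trans_prob[cbind(as.character(from$Cell.ID),
                                 as.character(next_state))],
  Length = sqrt((to$x - from$x)^2 + (to$y - from$y)^2)
)
print(arrows_df)
```

Each row of `arrows_df` describes one arrow: where it starts, where it points, the probability of that transition, and the arrow length used for drawing.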

7.4.2 Flow map: Highest transition probability considering self-state

The circle around each centroid represents the self-state probability.

7.4.3 Flow Map: Highest transition probability excluding self-states - Arrow size represents transition probability

Arrow segment length is based on the transition probability.

7.4.4 Flow map animation: Highest state transition probabilities (Including self-states)

7.4.5 Flow Map animation: Highest state transition probabilities (Excluding self-states)